feat: /idd-edit runtime enforcement (R4 + R5) via extracted helper script (#154) #159
kiki830621 wants to merge 3 commits into
Verify Report: PR #159 (Issue #154)

Engine
5 general-purpose Agents (Claude reviewers, file-based output) + Codex (gpt-5.5 xhigh, run_in_background) = 6-AI ensemble, all 6 returned with findings.

Process Gaps
3 initial reviewer Agents (logic / regression / devils-advocate) hit a transient API rate limit mid-execution; the Step 2.5b retry with FULL context re-paste recovered all 3 successfully (fresh Agent spawn). No coordinator self-review fallback needed. Final ensemble = 5 Claude + 1 Codex = full 6-AI.

Aggregate
🔴 FAIL: DO NOT MERGE
Scope coverage
PR refs: #150, #153, #154, #155, #156, #157, #158

Findings (merged, deduped, severity = max across sources)
What works (independently verified)
Scope check
✓ No scope creep at file-list level (all changes within Plan locked surface).

Sister-bug filing observation (Devil's Advocate finding #8)
6 sister/tangential issues filed (#155 / #156 / #157 / #158 / #160 / #161) for known-deferred items. 0 issues filed for the 8 reviewer-found bugs in this PR's own code. The IC_R011 file-by-default pattern is structurally optimized for filing "things found during implementation", not "things broken by implementation". Reviewer-found bugs in an active PR fall outside the IC_R011 capture surface; they belong in the fix loop.

Recommendation
Fix loop required. Specifically:

Pre-merge fixes (~1-2 hours, blocking):
Follow-up acceptable (file new issues):
Re-verify after fix:

Engine note (verify discipline)
This verify cycle's 3-Agent rate-limit recovery is itself a useful signal: under transient API throttling, the skill's Step 2.5b retry path with FULL context re-paste works as designed. No coordinator self-review fallback was needed. The cycle now has full 6-AI ensemble coverage. The substantial finding count is not a false positive from a degraded engine; these are real bugs that 6 independent AIs converged on (each BLOCKING bug confirmed by 2+ sources).

Next
…ript

Closes (Refs #150, literal-letter form per CLAUDE.md commit conventions §"引用 trap pattern 作反例的寫作紀律" [writing discipline for citing a trap pattern as a counter-example]) Requirements 4 + 5 runtime enforcement deferral, addressing R1/R2/R3 bash-incremental failure on PR #153 (3 verify iterations each introduced new parser bugs).

D1: PR path (this branch)
D2: Q4 errata flow → refuse-with-helpful-message (NOT auto-override), aligns with IC_R007 user-authored-intent spirit
D3: #156 test framework → ad-hoc shell test runner this PR + #156 generalizes later

Mid-impl pivot per .claude/scripts/tests/spectra-archive-post-ic/ precedent: extract parser to .claude/scripts/idd-edit-helper.sh (proper extracted helper) instead of inline SKILL.md bash (which is what kept breaking in R1/R2/R3).

.claude/scripts/idd-edit-helper.sh, 3 subcommands:
- parse-args: positional shift over 7 flags + missing-value guards (R3 C1/C2) + eq-form support + body-file readability (R3 H1) + R4 gate (R4) + override-pair guard. Emits eval-friendly KEY=value via printf %q.
- validate-target: single gh API call, *[bot] allowlist + OWNER passthrough + override pathway, R5 refuse exit 4 with actionable message.
- section-replace: awk-getline pattern (BSD/GNU safe; closes R3 C3, where BSD awk rejected a multi-line value passed via -v).

.claude/scripts/tests/idd-edit/: fixture-dir test runner with 13 fixtures covering R1/R2/R3 regression set + R4/R5 gate cases. All 13 GREEN.

SKILL.md changes:
- Frontmatter argument-hint reflects new flag syntax
- Step 1 + Step 2 replaced with helper script invocations
- Step 4 --replace mode uses section-replace helper
- ## 使用範例 (usage examples): 3 examples updated + 2 new (section-replace + errata override)
- ## Batch mode: per-target R4/R5 note + #158 cross-link

idd-comment/SKILL.md errata Template SPECIAL BEHAVIOUR: IDD_CALLER env var pattern + R5 refuse exit 4 handling + helpful message suggesting manual --override-user-content.

openspec/specs/append-vs-modify-discipline/spec.md: Purpose + R4 + R5 preambles updated from 'deferred to #154' to 'landed via #154' + specific helper subcommand + tested-by fixture refs.

plugin.json: 2.74.0 → 2.75.0.

precedent)

Refs #154 Refs #150 Refs #155 Refs #156 Refs #157 Refs #158
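The section-replace note above says the replacement text is read with awk's getline instead of being passed through `-v`. A minimal sketch of that pattern follows. It is not the PR's helper: the function name `replace_section`, the exact-line heading match, and the "section ends at the next heading of the same or higher level" rule are all assumptions made for illustration.

```shell
# Hypothetical sketch: replace the body under one markdown heading.
# The replacement is read from a file with getline, so no multi-line
# string is ever passed through `awk -v`.
# Usage: replace_section '<heading line>' <replacement-file> < input
replace_section() {
  awk -v heading="$1" -v repl="$2" '
    function level(line,   n) {
      # Count leading "#" characters; 0 means "not a heading".
      n = 0
      while (substr(line, n + 1, 1) == "#") n++
      return (n > 0 && substr(line, n + 1, 1) == " ") ? n : 0
    }
    $0 == heading {
      print                                  # keep the heading itself
      while ((getline line < repl) > 0) print line
      close(repl)
      skipping = 1; target = level($0)       # drop the old body
      next
    }
    # Stop dropping lines at the next heading of the same or higher level.
    skipping && level($0) > 0 && level($0) <= target { skipping = 0 }
    !skipping { print }
  '
}

# Demo with a throwaway replacement file.
repl_file="$(mktemp)"
printf 'new line 1\nnew line 2\n' > "$repl_file"
result="$(printf '# Title\n## Plan\nold line\n## Notes\nkeep me\n' |
  replace_section '## Plan' "$repl_file")"
rm -f "$repl_file"
printf '%s\n' "$result"
```

Only the single-line heading and the file path go through `-v` here; the multi-line payload stays in the file, which is the point of the pattern.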
… 5 HIGH + D1 discipline)

Addresses all blocking findings from 6-AI verify on PR #159 Round 1:

# Path + integration fixes (production-breaking)
C1: Helper + 18 fixtures moved from .claude/scripts/ to plugins/issue-driven-dev/scripts/ (matches process-attachments.sh precedent). Closes 'No such file' error on every production install.
C2: SKILL.md Step 1 enforces [[ COMMENT_ID =~ ^[0-9]+$ ]] before any URL/filename substitution. Closes path traversal via gh api + /tmp/idd-edit-repl-$COMMENT_ID.md.
B1: SKILL.md Step 4 --append uses $BODY_INPUT (helper-exported) not undefined $APPEND_BODY. Closes append silent-drop regression.
B2: SKILL.md Step 6/7 uses $REPO (helper-exported, respects --repo flag) not undefined $GITHUB_REPO.

# Security fixes (audit forgery + eval pollution)
C3+M1: New emit-audit-marker helper subcommand centralizes HTML-escape (`-->` collapsed to `-\>`, control chars stripped). All 3 modes use it for both edit + override markers. Closes audit forgery via $REASON / $SECTION_FLAG injection + R5 forensic gap (override marker now emitted in --append + --prepend-note + --replace).
H2: SKILL.md Step 1 splits parse-args stdout/stderr via temp file. eval only sees printf %q quoted assignments; stderr (potentially containing $()) never reaches eval. Closes shell-injection POC.

# Correctness fixes
H1: helper R4 gate validates --scope value MUST be 'whole-comment' (no other valid scopes today). Invalid value → exit 3 with hint.
H3: IDD_EDIT_HELPER_GH_MOCK env var added to validate-target subcommand. 4 new fixtures (15-18) exercise OWNER passthrough / bot allowlist / non-OWNER refuse / override pathway. M6 (null author guard) + M3 (dead-code bot allowlist) also addressed.
H4: section-replace heading-level counter rewritten via awk char-by-char (no wc -c trailing-newline off-by-one). CRLF strip via tr -d '\r' on both input + replacement. Fixture 08 strengthened with exact-stdout match. New fixture 19 verifies CRLF.
H5: SKILL.md 鐵律 (iron rules) documents --body-file path-traversal risk.

# Discipline
D1: This commit body uses code-fence wrap around close-keyword examples + literal-letter form for any close-keyword + #digit references. See plugins/issue-driven-dev/CLAUDE.md Commit Conventions for the full pattern.

# Test gate proof
All 19 fixtures pass:
bash plugins/issue-driven-dev/scripts/tests/idd-edit/test.sh
→ Results: 19 passed, 0 failed

# Tests added in Round 2
- 14-replace-scope-invalid (H1 invalid scope value)
- 15-validate-owner-passes (H3 mock OWNER)
- 16-validate-bot-passes (H3 mock bot)
- 17-validate-non-owner-refuses (H3 mock CONTRIBUTOR)
- 18-validate-non-owner-with-override (H3 mock + override)
- 19-section-replace-crlf (H4 CRLF input)

Refs #154 Refs #150 Refs #97
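The H2 fix above depends on two things working together: the helper quotes every value with printf %q, and the caller evals only the helper's stdout. The sketch below shows that shape under stated assumptions. `fake_parse_args` is a hypothetical stand-in for the real parse-args subcommand, and the temp-file handling is illustrative rather than copied from SKILL.md.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the helper's parse-args subcommand: it writes
# %q-quoted assignments to stdout and free-form diagnostics to stderr.
fake_parse_args() {
  printf 'REASON=%q\n' "$1"
  printf 'note: input contained $(echo never-run)\n' >&2
}

# Capture stdout and stderr separately so only stdout is ever eval'd.
out_file="$(mktemp)"
err_file="$(mktemp)"
fake_parse_args 'typo fix; $(echo injected) "quoted"' >"$out_file" 2>"$err_file"
status=$?

if [ "$status" -eq 0 ]; then
  eval "$(cat "$out_file")"   # stdout holds only quoted KEY=value lines
else
  cat "$err_file" >&2         # show diagnostics, never eval them
fi
diagnostics="$(cat "$err_file")"
rm -f "$out_file" "$err_file"

# The value arrives as literal text; the $(...) inside it was not executed.
printf 'REASON is: %s\n' "$REASON"
```

If stderr were merged into the eval'd string (for example via `2>&1`), the unquoted `$(...)` in the diagnostic line would be executed, which is the class of injection the split is meant to prevent.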
ce0adf4 to 423a07c
Verify Report: PR #159 ROUND 2 (Issue #154)

Engine
5 general-purpose Agents + Codex (gpt-5.5 xhigh) = 6-AI ensemble, all 6 returned (no rate-limit recovery needed this round; clean first-attempt run).

Aggregate
🔴 FAIL: DO NOT MERGE (R2 introduced new bugs while fixing R1)
Round 1 fix status (per Codex independent verification)
11 R1 findings: 7 PASS / 4 PARTIAL. PARTIAL = fix incomplete or merely documented; underlying threat surface remains.

NEW Findings (introduced by R1 fix loop)
Meta-Findings (DA layer)
Test gate (still GREEN, but misleading)
Important: all 19 fixtures pass. But the bugs above are in the SKILL.md↔helper integration layer (H7 batch loop), the helper's escape contract (H6

Scope check
No file-list scope creep. But architectural scope inflation: the fix-loop pattern is escalating fixture count without addressing the test infrastructure gap (#156 was deferred).

Recommendation
User's options at this verdict moment:

Path (a): R3 strict-scope (~30 min, recommended by DA)
Path (b): Continue fix-all-in-one (~1-2h, follows feedback_verify_fix_same_pr)
Path (c): Escalate to #155 (alternative-layer)
Path (d): Merge as-is + file follow-ups
Engine note
R2 had no Agent rate-limiting (unlike R1's 3/5 retry need). All 6 reviewers produced findings on first attempt. The 4-way convergence on H6 quote injection is the strongest verify signal we've seen this session; it strongly suggests the bug is real and not a single-reviewer over-read.

Plan R6 mitigation triggered: "If [later] verify iteration introduces yet another bug, revert + escalate to #155 alternative-layer discussion." This is R2, not R4, but the pattern emerged earlier than expected: each fix round introduces roughly half as many new bugs as it fixes (R1 fixed 11; R2 introduced 4-5 new HIGH). Continued fix loops won't converge.

Next
User decides path (a) / (b) / (c) / (d). Re-verify after fix.
…luence)

Addresses all R2 verify findings on PR #159 + bonus M3/M6 cleanup. Round 3 of #154 fix loop. Per user choice (path b: fix-all-in-one).

# Security (HIGH)
H6 (4-way confluence: Logic + Security + DA + Codex independently surfaced): emit-audit-marker now escapes " → &quot; before embedding in marker attributes. Closes quote injection that allowed forging audit attributes via --reason='ok" date="1970-01-01" forged="yes'. Bash gotcha: & is treated as a matched-text back-reference in the replacement; required \&quot; escape. Also closes latent H7 (key sanitization).
H7 (regression): SKILL.md Step 1 for-loop restructured. Previously closed BEFORE Steps 1.5-7, silently processing only LAST target in batch mode. Now accumulates RESOLVED_COMMENT_IDS array; explicit 'Per-target outer loop' subsection wraps Steps 1.5-7 per resolved comment ID.
H8 (security): IDD_EDIT_HELPER_GH_MOCK env var now requires IDD_EDIT_HELPER_TEST_MODE=1 paired. Closes R5 author bypass via attacker-crafted env (classic LD_PRELOAD pattern). test.sh auto-sets both for mock fixtures; fixture 21 deliberately tests the gate refuses without TEST_MODE.
H10 (security): validate_body_file_path() helper refuses sensitive absolute paths (/etc/* /var/* /sys/* /proc/* /private/etc|var/* + $HOME/.ssh|.aws|.gnupg|.kube|.docker/*). Escape hatch: IDD_EDIT_HELPER_ALLOW_UNSAFE_BODY_FILE=1. Wired into both eq + space form --body-file branches.

# Documentation (HIGH)
H9: CHANGELOG line 58 corrected: removed false walk-up config claim. Now explicit: '$REPO currently respects --repo flag only; walk-up resolution deferred to future enhancement'.

# Logic (MEDIUM)
M-R2-1: validate-target null guard extended to also catch empty string from jq parse failure (malformed mock JSON). Added 2>/dev/null on jq calls.
M-R2-2: tr range tightened to \000-\037 (full ctrl chars including TAB and CR); comment corrected.
M-R2-3: CHANGELOG entry documents marker format change (values now double-quoted: mode="X" not mode=X). Downstream parsers should accept either form.

# Architectural gap (deferred)
M-R2-4: SKILL.md↔helper integration test layer gap filed as follow-up #163. 3 reviewers independently flagged + R1 B1/B2 + R2 H7 are real-world evidence. Deferred from R3 to avoid scope inflation per DA recommendation; P2 priority.

# Doc drift (LOW)
N1-3: sed pass over SKILL.md L72/125 + test.sh L2 + spec.md L7/116/162 replacing stale '.claude/scripts/' refs with 'plugins/issue-driven-dev/scripts/' (matching C1 helper move). Fixture count refs updated 13→23.

# Test gate
bash plugins/issue-driven-dev/scripts/tests/idd-edit/test.sh
→ Results: 23 passed, 0 failed

# Tests added in R3
- 20-audit-marker-quote-injection (H6 quote escape)
- 21-mock-requires-test-mode (H8 gate refuses without TEST_MODE)
- 22-body-file-refuses-etc (H10 sensitive path refuse)
- 23-body-file-escape-hatch (H10 safe path /tmp/ allowed)

Refs #154 Refs #150 Refs #97 Refs #163
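The H6 and M-R2-2 items above describe escaping attribute values before they are embedded in an HTML-comment audit marker. The following is a rough sketch of that idea, not the PR's emit-audit-marker: the marker name `idd-edit`, the attribute layout, and the extra `>` → `&gt;` escape are assumptions chosen for illustration. It uses sed, where a bare `&` in the replacement also means "the matched text", so each literal ampersand is written as `\&`.

```shell
# Hypothetical sketch of attribute escaping for an HTML-comment audit marker.
# Control characters are stripped first; "&" is escaped before the other
# characters so the entities added afterwards are not escaped twice.
escape_attr() {
  printf '%s' "$1" |
    tr -d '\000-\037' |
    sed -e 's/&/\&amp;/g' -e 's/"/\&quot;/g' -e 's/>/\&gt;/g'
}

# Assumed marker shape: <!-- idd-edit mode="..." reason="..." -->
emit_marker() {
  printf '<!-- idd-edit mode="%s" reason="%s" -->\n' \
    "$(escape_attr "$1")" "$(escape_attr "$2")"
}

# The forged-attribute payload from the H6 description stays inside reason="...".
marker="$(emit_marker append 'ok" date="1970-01-01" forged="yes')"
printf '%s\n' "$marker"

# A value that tries to close the comment early is neutralised as well.
closer="$(escape_attr 'a --> b')"
printf '%s\n' "$closer"
```

Escaping `>` is one way to keep a value from terminating the comment; the PR text describes a different collapse for `-->`, so treat that part as a substitute technique rather than a description of the helper.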
Verify Report: PR #159 ROUND 3 (Issue #154)

Engine
5 general-purpose Agents + Codex (gpt-5.5 xhigh) = 6-AI ensemble, all 6 returned with findings on first attempt (no rate-limit recovery needed, same as R2).

Aggregate
🔴 FAIL: DO NOT MERGE (Plan R6 escalation criterion empirically satisfied)
5-way confluence on H10 (strongest verify signal of session)
Logic + Security + Regression + DA + Codex ALL independently arrived at the same conclusion: R3 H10
Only the literal

R3 fix status
Score: 5 FIXED / 4 PARTIAL or NOT FIXED / 1 DEFERRED.

NEW Findings (introduced by R3 fix loop OR exposed by deeper probes)
Meta-Findings (Plan R6 escalation criterion)
Test gate (still GREEN, but more misleading than R2)
23/23 GREEN. But:
The test suite is now systematically miscalibrated: it tests what was fixed, not what's actually broken.

Scope check
No file-list scope creep. Test suite growth is intended scope expansion + reasonable per fix.

Recommendation (3rd pivot point; same as R2 but stronger signal)
User's options at this verdict moment:

Path (a): R4 strict-scope fixes (~45 min, may converge)
Path (b): R4 fix-all-in-one (~1.5-2h, NOT recommended)
Path (c): ESCALATE to #155 (recommended by DA + Plan R6) (~10min to revert + 1h #155 evaluation)
Path (d): Merge with H10 caveat doc + file follow-ups (~10min, NOT recommended)
Engine note (R3 verify discipline)
R3 had no Agent rate-limiting (clean first-attempt run, same as R2). All 6 reviewers produced findings on first attempt. The 5-way confluence on H10 bypass is the strongest verify signal surfaced in this session and strongly suggests escalation is warranted.

R3 verify cost: ~10min compute, surfaced ~7 NEW findings (5 confirmed via live POC). R3 fix cost was ~1 hour. R3 verify provides ~10x ROI on engineering time by preventing a partially-fixed merge from shipping. This is the verify ensemble doing its job.

The empirical evidence over 3 rounds:
DA-R3-M6 recommendation is strongest: escalation to #155 is now the recommended path. Continuing fix loops within the bash layer = sunk-cost fallacy.

Next
User decides path (a) / (b) / (c) / (d). Re-verify after fix OR escalate to #155.
Closed without merging per #154 escalation decision to #155 (Plan R6 criterion empirically satisfied across 3 verify rounds). See #154 (comment) for full rationale. Feature branch idd/154-edit-runtime preserved as audit trail. The critical path is now #155 (alternative-layer evaluation).
Refs #154 Refs #150 Refs #155 Refs #156 Refs #157 Refs #158
Summary
Closes (Refs #150 — literal-letter form per discipline) Requirements 4 + 5 runtime enforcement deferral, addressing R1/R2/R3 bash-incremental failure on PR #153 (3 verify iterations each introduced new parser bugs).
Approach (per #154 D1-D3 plan-locked decisions):
idd/154-edit-runtime)

Mid-impl pivot per .claude/scripts/tests/spectra-archive-post-ic/ precedent: extract parser to .claude/scripts/idd-edit-helper.sh (proper extracted helper) instead of inline SKILL.md bash. This solves R1/R2/R3 root cause more deeply: AI no longer generates parser bash inline.

What lands
- .claude/scripts/idd-edit-helper.sh
- .claude/scripts/tests/idd-edit/
- .../skills/idd-edit/SKILL.md
- .../skills/idd-comment/SKILL.md
- openspec/specs/append-vs-modify-discipline/spec.md
- plugin.json
- CHANGELOG.md

Test gate proof
BREAKING (runtime)
- /idd-edit --replace without --scope/--section now refuses (exit 3 + R4 message)
- /idd-edit modifying non-OWNER non-bot comment now refuses (exit 4 + R5 message); /idd-comment errata flow auto-call handles gracefully

Checklist
ce0adf4)
/idd-verify --pr <this-PR>)
/idd-close #154 after merge

Filed mid-impl
<!-- @trace --> blocks lack auto-update mechanism (tangential from #154 plan) #157: spec.md <!-- @trace --> blocks no auto-updater (parking-lot P3)
/idd-edit batch mode + R5 interaction semantics (P2; single-target enforcement shipped, batch interaction follow-up)

Generated by /idd-implement on PR path. Do NOT add a GitHub close trailer (Closes/Fixes/Resolves): IDD discipline requires manual /idd-close after merge to enforce checklist gate + closing summary.